Beyond Basic Search: Addressing the Limitations of Semantic Similarity
AI010 Lesson 8

Beyond Similarity

The "80% Problem" occurs when basic semantic search works for simple queries but fails on edge cases. When we search by similarity alone, the vector store returns the chunks most numerically similar to the query. If those top chunks are nearly identical to one another, the LLM receives redundant information, wasting the limited context window and missing broader perspectives.

Advanced Retrieval Pillars

  1. Maximum Marginal Relevance (MMR): Instead of just picking the most similar items, MMR balances relevance with diversity to avoid redundancy.
    $$MMR = \text{argmax}_{d \in R \setminus S} [\lambda \cdot \text{sim}(d, q) - (1 - \lambda) \cdot \max_{s \in S} \text{sim}(d, s)]$$
  2. Self-Querying: Uses the LLM to transform natural language into structured metadata filters (e.g., filtering by "Lecture 3" or "Source: PDF").
  3. Contextual Compression: Shrinks retrieved documents to extract only the "high-nutrient" snippets relevant to the query, saving tokens.
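The MMR formula above can be sketched in plain Python. This is a toy greedy implementation over made-up 2-D vectors, not LangChain's built-in `search_type="mmr"`; the example data and the `lam` value are illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def mmr(query, docs, k=2, lam=0.5):
    """Greedy MMR: repeatedly pick the candidate d maximising
    lam * sim(d, q) - (1 - lam) * max_{s in S} sim(d, s)."""
    selected = []                      # S: indices already chosen
    candidates = list(range(len(docs)))  # R \ S
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(docs[i], query)
            redundancy = max((cosine(docs[i], docs[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy example: docs 0 and 1 are near-duplicates; doc 2 is less similar
# to the query but adds diversity.
query = [1.0, 0.0]
docs = [[0.9, 0.1], [0.91, 0.09], [0.5, 0.5]]
print(mmr(query, docs, k=2, lam=0.3))  # → [1, 2]
```

With a diversity-leaning `lam=0.3`, MMR picks the most relevant chunk first, then skips its near-duplicate in favour of the more distinct third chunk, which is exactly the behaviour that avoids the Redundancy Trap described below.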
The Redundancy Trap
Providing the LLM with three versions of the same paragraph doesn't make it smarter—it just makes the prompt more expensive. Diversity is key to a "high-nutrient" context.
Knowledge Check
You want your system to answer "What did the instructor say about probability in the third lecture?" specifically. Which tool allows the LLM to automatically apply a filter for { "source": "lecture3.pdf" }?
ConversationBufferMemory
Self-Querying Retriever
Contextual Compression
MapReduce Chain
Challenge: The Token Limit Dilemma
Apply advanced retrieval strategies to solve a real-world constraint.
You are building a RAG system for a legal firm. The documents retrieved are 50 pages long, but only 2 sentences per page are actually relevant to the user's specific query. The standard "Stuff" chain is throwing an OutOfTokens error because the context window is overflowing with irrelevant text.
Step 1
Identify the core problem and select the appropriate advanced retrieval tool to solve it without losing specific nuances.
Problem: The context window limit is being exceeded by "low-nutrient" text surrounding the relevant facts.

Tool Selection: ContextualCompressionRetriever
Step 2
What specific component must you use in conjunction with this retriever to "squeeze" the documents?
Solution: Use an LLMChainExtractor as the base for your compressor. This will process the retrieved documents and extract only the snippets relevant to the query, passing a much smaller, highly concentrated context to the final prompt.
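The wiring can be sketched roughly as below. This is a configuration sketch assuming the classic LangChain package layout (import paths and method names vary across versions), and `llm`, `vectorstore`, and the sample legal query are placeholders set up elsewhere.

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# The extractor uses the LLM to pull out only query-relevant snippets.
compressor = LLMChainExtractor.from_llm(llm)

# Wrap the normal similarity-search retriever with the compressor.
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(),
)

# Placeholder query: the retrieved 50-page documents come back squeezed
# down to the few relevant sentences before hitting the Stuff chain.
docs = compression_retriever.get_relevant_documents(
    "What indemnification clauses apply to subcontractors?"
)
```

Because only the extracted snippets reach the final prompt, the same "Stuff" chain that previously overflowed the context window now fits comfortably within it.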